Entry Name:  VADER-MC2

VAST Challenge 2015
Mini-Challenge 2

 

 

Team Members:

Yifan Zhang, yifan.zhang@asu.edu

Xing Liang, xliang22@asu.edu

Michael Steptoe, msteptoe@asu.edu

Sagarika Kadambi, sskadamb@asu.edu

Wei Luo, wluo23@asu.edu

Dawei Zhou, dawei.zhou@asu.edu

Hanghang Tong, htong6@asu.edu

Jingrui He, jingruih@asu.edu

Ross Maciejewski, rmacieje@asu.edu

 

Student Team:  Yes

 

Did you use data from both mini-challenges?  Yes

 

Analytic Tools Used:

https://www-complexnetworks.lip6.fr/~latapy/PP/walktrap.html

 

Approximately how many hours were spent working on this submission in total?

240 hours between the students

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2015 is complete? Yes

 

Video Download

Video: https://youtu.be/VGXrxlxstt4


Questions

 

MC2.1 Identify those IDs that stand out for their large volumes of communication. For each of these IDs:

 

      a.      Characterize the communication patterns you see.

      b.      Based on these patterns, what do you hypothesize about these IDs?

 

Limit your response to no more than 4 images and 300 words.

 

We used a calendar view (16 hours vertically x 60 minutes horizontally) and a histogram to identify IDs with large volumes of communication. We categorize communication volume into four types: sent, received, external, and unique (i.e., if a visitor sends multiple messages at the same time, they count only once toward the unique volume). We can explore the sent-message histogram and interactively select bins with a high volume of messages to extract the corresponding IDs. We found two IDs with high volumes of sent messages: 1278894 and 839736.

p1
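The binning behind the calendar view can be sketched as follows. This is a minimal illustration, not the tool's actual code: it assumes one day's message timestamps for a single ID as Python `datetime` objects and assumes that row 0 is the 08:00 hour (the figures span 08:00 to 24:00). The function names are hypothetical.

```python
from datetime import datetime

OPEN_HOUR = 8    # assumption: row 0 corresponds to 08:00-08:59
N_HOURS = 16     # 08:00 to 24:00
N_MINUTES = 60


def calendar_grid(timestamps):
    """Count messages per (hour, minute) cell of a 16 x 60 calendar view."""
    grid = [[0] * N_MINUTES for _ in range(N_HOURS)]
    for ts in timestamps:
        row = ts.hour - OPEN_HOUR
        if 0 <= row < N_HOURS:  # ignore anything outside opening hours
            grid[row][ts.minute] += 1
    return grid


def unique_grid(timestamps):
    """Same grid, but messages sent at the same instant count only once."""
    return calendar_grid(sorted(set(timestamps)))
```

Summing a row of `unique_grid` versus `calendar_grid` gives the unique and sent volumes for that hour.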

 

ID:1278894 has more than 189,000 sent and received messages; however, its unique volume is only 180 and it has no external communications. Its calendar view shows a high volume of communication every five minutes from 12:00 to 21:00 each day. We therefore hypothesize that this ID is an automated park messaging bot that broadcasts announcements.
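One way to check the five-minute rhythm numerically is to collapse simultaneous sends into one event per minute and look at the most common gap between consecutive events. A minimal sketch, assuming the ID's send timestamps are `datetime` objects; `dominant_gap_minutes` is a hypothetical helper, not part of the authors' tool.

```python
from collections import Counter
from datetime import datetime


def dominant_gap_minutes(timestamps):
    """Return the most common gap (in minutes) between consecutive send minutes."""
    # Collapse simultaneous sends (one broadcast) into a single event per minute.
    minutes = sorted({ts.replace(second=0, microsecond=0) for ts in timestamps})
    gaps = [int((b - a).total_seconds() // 60) for a, b in zip(minutes, minutes[1:])]
    if not gaps:
        return None  # fewer than two distinct send minutes
    return Counter(gaps).most_common(1)[0][0]
```

A result of 5 for a high-volume ID is consistent with a scheduled broadcaster; gaps across day boundaries are rare and do not change the most common value.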

 

ID:839736 has nearly identical sent and received message counts (over 60,000) and no external communications, but a large number of unique communications. Its calendar view shows no obvious temporal pattern, so we hypothesize that this ID is a park account that replies to visitors, like an information service center.

 

After excluding these two IDs, we look into the other IDs with the highest communication volumes.

 

These top 10 IDs each have over 34 sent messages (more than they received) and very few external communications. Their unique volume is one tenth of their sent volume. Their calendar views show that the communications are not evenly distributed over time but arrive in bursts at certain times (e.g., ID:1045021), and all of them were in the park on all three days. We therefore hypothesize that these IDs are likely park staff.

 

p2

 

 

MC2.2 Describe up to 10 communications patterns in the data. Characterize who is communicating, with whom, when and where. If you have more than 10 patterns to report, please prioritize those patterns that are most likely to relate to the crime.

 

Limit your response to no more than 10 images and 1000 words.

 

For this question, we linked each patron ID to the movement data and assigned each communication to the attraction closest to the sender's position at that time. These locations are then shown in a calendar view, where each cell is shaded by the number of To/From/External/Unique calls. We explored these locations to identify typical call patterns related to rides in the park.
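Assigning a message to an attraction can be sketched as a lookup of the sender's latest movement record followed by a nearest-neighbor search over attraction coordinates. This is an assumed procedure for illustration: the record layout `(time, x, y)`, the attraction coordinate table, and the use of Euclidean distance are all hypothetical, as are the function names. `math.dist` requires Python 3.8 or later.

```python
import bisect
import math


def latest_position(track, t):
    """track: list of (time, x, y) sorted by time; return (x, y) at or before t."""
    times = [rec[0] for rec in track]
    i = bisect.bisect_right(times, t) - 1
    if i < 0:
        return None  # patron not yet seen in the movement data
    return track[i][1], track[i][2]


def nearest_attraction(pos, attractions):
    """attractions: dict attraction_id -> (x, y); return the closest attraction id."""
    return min(attractions, key=lambda a: math.dist(pos, attractions[a]))


def locate_message(track, t, attractions):
    """Assign a message sent at time t to the attraction nearest the sender."""
    pos = latest_position(track, t)
    return None if pos is None else nearest_attraction(pos, attractions)
```

Using the latest record at or before the send time avoids attributing a message to a place the patron had not yet reached.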

 

Pattern 1: Messages at attraction 32-Creighton Pavilion and 63-Grinosaurus Stage.

The image below shows the communication peaks at attractions 32 and 63 on Friday. The peaks at attraction 63 occur from 09:30 to 11:30 and from 14:30 to 16:30, whereas the peaks at attraction 32 occur from 11:30 to 15:00. From this alternation, we can infer when the soccer star's show at the Grinosaurus Stage ends. We can also infer that the Pavilion is closed from 10:00 to 11:30 and from 15:00 to 16:30, as few communications are sent from there during these times.

 

Figure 1: Distribution of sent messages on Friday from 08:00 to 24:00.

 

 

This pattern changes on Sunday: attractions 32 and 63 each show only one communication peak, unlike the previous two days. Based on the communication volume there, we infer that the afternoon show at attraction 63 and the display at attraction 32 were canceled. We speculate that the vandalism caused the show to stop, which in turn reduced communications.

 

Figure 2: Distribution of sent messages on Sunday from 08:00 to 24:00.

 

Pattern 2: Park broadcasts.

We observe that some IDs broadcast large numbers of messages over a short period of time, while others send messages intermittently. We hypothesize that the park uses a broadcast system to send messages to many patrons simultaneously. We assume that all messages sent by one ID within one minute are the same message, and we count the unique messages in the data on that basis. The calendar view shows the distribution of unique messages at each location, revealing both the high volume of sent/received messages and the large number of unique messages. The figure below shows the total messages sent on Sunday (left) and the unique messages sent (right). At the Pavilion, the number of unique messages is of the same magnitude as the total number of messages, which means that at this time the messages are being sent by visitors rather than by a broadcast system; this likely corresponds to the discovery of the vandalism.

Figure 3: Comparisons of the sent (left) and unique (right) messages.
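The unique-message count described in Pattern 2 can be sketched as counting distinct (sender, minute) pairs per location. A minimal sketch, assuming each message has already been reduced to a sender ID, a minute index, and the location assigned to it; the function name and location labels are hypothetical.

```python
from collections import defaultdict


def sent_and_unique(messages):
    """messages: iterable of (sender_id, minute, location).

    Returns {location: (total_sent, unique_sent)}, where all messages one
    sender emits within the same minute count as a single unique message.
    """
    total = defaultdict(int)
    unique_keys = defaultdict(set)
    for sender, minute, location in messages:
        total[location] += 1
        unique_keys[location].add((sender, minute))
    return {loc: (total[loc], len(unique_keys[loc])) for loc in total}
```

A unique/total ratio close to 1 points to many individual senders, while a very low ratio points to broadcasting.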

 

We also support visualizing the number of messages sent to people outside the park. The volume of external messages also peaks from 12:00 to 12:30 PM, further indicating that the vandalism was discovered around this time.

Figure 4: Comparisons of the external message amount.

 

Pattern 3: Criminal identification

Since we suspect the crime occurred at the Pavilion between 10:30 and 11:30 AM, we explore the communication patterns of people located near the Pavilion during this time. First, we use the histogram to show the visitors who made external communications during that period. This yields the following visitor IDs: 461004, 416790, 1502920, 611447, 771453, 668872, 921888, 988181, 1101361, 1102394, 1358860, 1364488, 1872848, 1938686.
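The selection step can be sketched as a filter over messages by location, time window, and external flag. A minimal sketch, assuming messages from a single day have already been tagged with a clock time, an assigned location, and an external flag; `external_senders` is a hypothetical name.

```python
from datetime import time


def external_senders(messages, location, start, end):
    """messages: iterable of (sender_id, clock_time, location, is_external).

    Returns the sorted, de-duplicated IDs that sent an external message at
    `location` with start <= clock_time < end.
    """
    ids = {
        sender
        for sender, t, loc, external in messages
        if external and loc == location and start <= t < end
    }
    return sorted(ids)
```

Collecting IDs in a set guarantees that each patron appears only once in the resulting list.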

 

We then use our communication explorer tool (Figure 5) to examine these IDs and find that three of them (461004, 416790, 1502920) behave abnormally. They entered the park together and went to the Pavilion around 09:00 AM. Over the next 30 minutes, they stayed there with few communications. From 09:40 AM until 12:00 PM, however, they communicated heavily, which is very different from the preceding 30 minutes.

 

Figure 5: Visitors' checked-in attractions over time and their communications. In this view, the x axis represents time from 08:00 to 24:00 and the y axis represents the attractions the visitors are near. Circles represent visitors moving from one attraction to another, and circle size encodes the number of patrons. A circle contains only patrons that have a connection in the communication data. In the lower left, a small black circle appears at attraction 84, meaning that one or two people checked in there at that time. When this group later moves to attraction 81, a green line is drawn to highlight the movement. Hovering over the line shows whether patrons joined or left the group: if the circle turns red, a person left the group at this attraction; if it turns blue, a person joined the group. Short blue lines represent internal communications among the visitors at an attraction, and short red lines represent their external communications. In this way, we can explore how patrons who communicate with each other travel around the park.
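The red/blue coloring in Figure 5 amounts to a set difference between a group's members at consecutive attractions. A minimal sketch, assuming group membership at each attraction is available as a collection of patron IDs; `group_changes` is a hypothetical name.

```python
def group_changes(previous, current):
    """Compare a group's members at two consecutive attractions.

    Returns (joined, left): IDs present only at the new attraction (drawn
    blue) and IDs that dropped out of the group (drawn red).
    """
    previous, current = set(previous), set(current)
    return sorted(current - previous), sorted(previous - current)
```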

 

To further trace the movement of the group (461004, 416790, 1502920), we visualize their attraction check-ins and communications in the same graph. Most of the time, they visit the same attractions and rarely separate.

 

 

Figure 6: Check-ins and communications of group (461004, 416790, 1502920).
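The observation that the group rarely separates could be quantified with a simple co-location ratio. This is an illustrative measure assumed here, not one from the original analysis: check-ins are assumed to be discretized into time slots, and the ratio is the fraction of slots observed for both patrons in which they were at the same attraction.

```python
def colocation_ratio(checkins, a, b):
    """checkins: dict patron_id -> {time_slot: attraction_id}.

    Fraction of the time slots observed for both patrons in which they were
    at the same attraction (0.0 if they share no slots).
    """
    shared = checkins[a].keys() & checkins[b].keys()
    if not shared:
        return 0.0
    same = sum(1 for s in shared if checkins[a][s] == checkins[b][s])
    return same / len(shared)
```

Values near 1 for every pair in a candidate group would support the reading that its members travel together.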

 

 

Another suspect is visitor 921888, who checked in at the Pavilion right after the group (461004, 416790, 1502920) (Figure 5). According to the records, this was the patron's first day in the park, and he or she returned to the hotel at 12:10 PM (Figure 7).

 

 

Figure 7: Visitor 921888’s check-ins and communications.

 

Pattern 4: High external call volumes

We also observed patrons with high external call volumes, which we explore with a histogram of each patron's total external message volume.

Figure 8: Histogram for the total external amount excluding the broadcasting IDs.
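The counts behind this histogram can be sketched as a per-sender tally of external messages that skips the park IDs. A minimal sketch; it assumes the excluded broadcasting IDs are the two park IDs identified in MC2.1 and that each message carries an external flag. The constant and function names are hypothetical.

```python
from collections import Counter

# Assumption: the excluded "broadcasting IDs" are the two park IDs from MC2.1.
PARK_IDS = {1278894, 839736}


def top_external_senders(messages, k=10):
    """messages: iterable of (sender_id, is_external).

    Counts external messages per sender, skipping the park IDs, and returns
    the k largest (sender_id, count) pairs.
    """
    counts = Counter(
        sender for sender, external in messages
        if external and sender not in PARK_IDS
    )
    return counts.most_common(k)
```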

 

We explore the movement and communication pattern of one such patron, visitor 1711922, and find that this person sent a large number of external messages at the Pavilion even though the Pavilion was closed.

Figure 9: Attraction and communication situation for the visitor 1711922.

 

Pattern 5: Visitor preferences

 

The figure below shows that more people sent messages from the thrill rides than from other attractions, which suggests that these rides are especially popular.

 

Figure 10: Distribution of sent message volume on Sunday.

 

 

 

MC2.3 From this data, can you hypothesize when the crime was discovered?  Describe your rationale.

 

Limit your response to no more than 3 images and 300 words. 

 

From the data, we hypothesize that the crime was discovered on Sunday between 11:30 and 11:45 AM. In the attraction calendar view, all four types of communication (Sent, Received, External, and Unique) around the Pavilion (attraction 32) on Friday and Saturday display a strong temporal pattern, with high-volume peaks from 8:30 to 10:00, 11:30 to 15:00, and 16:30 to 21:00. On Sunday, the first peak (8:30 to 10:00) is still present, but the second peak (11:30 to 15:00) changes and the third disappears.

p3
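Comparing Sunday with the two earlier days can be sketched as a per-cell comparison against the Friday/Saturday mean. A minimal sketch, assuming 30-minute calendar cells starting at 08:00; the factor-of-two threshold is an arbitrary illustration, not a value from the analysis, and the function names are hypothetical.

```python
def half_hour_bin(hour, minute, open_hour=8):
    """Index of the 30-minute calendar cell, with 0 = 08:00-08:29."""
    return (hour - open_hour) * 2 + minute // 30


def deviating_bins(baseline_days, target_day, ratio=2.0):
    """baseline_days: list of per-bin count lists (e.g. Friday, Saturday).

    Returns the bins where the target day's count is more than `ratio` times
    the baseline mean, or less than 1/`ratio` of it.
    """
    flagged = []
    for i, count in enumerate(target_day):
        mean = sum(day[i] for day in baseline_days) / len(baseline_days)
        if mean == 0:
            continue  # no baseline activity to compare against
        if count > ratio * mean or count < mean / ratio:
            flagged.append(i)
    return flagged
```

Both a surge (such as the 11:30 cell) and a drop (such as the missing afternoon peak) are flagged by the same rule.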

 

On Friday and Saturday, the sent volume at the Pavilion from 11:30 to 12:00 is a few thousand messages, but on Sunday it exceeds 29,000 during the same period. The volume of communication also decreases significantly after 12:30 on Sunday, so we hypothesize that the crime was discovered between 11:30 and 12:00. Next, we investigate the detail view of the communications from 11:30 to 12:00 by clicking that cell in the calendar. A set of circles is displayed above the calendar, each colored by the number of communications per minute during this half-hour block. The external volume shows a clear division at 11:45 AM. We hypothesize that the staff discovered the crime at 11:38 AM and broadcast several announcements, which resulted in increased sent and received volumes at 11:39, 11:41, and 11:44. After that, visitors started messaging their friends and staff may have contacted the authorities, which resulted in a significant increase in external volume at 11:45.

 

p4
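The division in external volume was read off the detail view; a simple way to locate such a split automatically is a single mean-shift change point. This is a swapped-in technique for illustration, not the method used above, and it assumes the half-hour cell is given as per-minute counts starting at 11:30; `split_minute` is a hypothetical name.

```python
def split_minute(counts):
    """counts: per-minute message counts for one half-hour cell.

    Returns the index that best splits the series into a lower segment and a
    higher segment, i.e. maximizes mean(after) - mean(before).
    """
    best_i, best_gap = None, float("-inf")
    for i in range(1, len(counts)):
        before, after = counts[:i], counts[i:]
        gap = sum(after) / len(after) - sum(before) / len(before)
        if gap > best_gap:
            best_i, best_gap = i, gap
    return best_i
```

With counts starting at 11:30, a returned index of 15 corresponds to 11:45.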